import networkx as nx
from networkx import DiGraph
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from collections import Counter
import pandas as pd
import plotly.graph_objects as go
import numpy as np
import seaborn as sea
from scipy.stats import spearmanr
from scipy.stats import kruskal
import plotly.io as pio
1. Introduction
The dataset used in this analysis comes from the who-trusts-whom network of the Bitcoin OTC marketplace, an early peer-to-peer platform where users exchanged Bitcoin. Since participants were anonymous, the platform relied on a reputation system to reduce the risk of fraud. Each member could rate others on a scale from −10, indicating complete distrust, to +10, indicating complete trust. These ratings form a weighted, signed, and directed network that records both positive and negative assessments. The dataset, published through the Stanford Large Network Dataset Collection (SNAP), is notable as the first explicit weighted, signed, directed network made publicly available for research.
This structure provides a valuable foundation for applying social network analysis. Unlike simple binary networks, it captures not only whether a connection exists but also the strength and direction of that connection. As a result, it allows for a detailed examination of how trust and distrust are distributed, how reputational differences influence user positions, and what kinds of structural patterns emerge in an anonymous market. The analysis presented here focuses on centrality measures, group comparisons, and effect sizes to identify which actors hold influence within the network and how trust dynamics shape the overall system.
2. Data Preparation
The raw file lists directed ratings from a rater (“source”) to a ratee (“target”) with timestamps and scores from −10 to +10. To avoid counting multiple updates from the same rater, the edge list was first sorted by time and reduced to the most recent observation for each unique source–target pair. From this cleaned edge list, a node-level reputation score was computed as the average of all incoming ratings a node received across its distinct raters. Nodes with no incoming ratings were kept and flagged for labeling.
This average incoming score was then rounded to the nearest integer and mapped to categorical labels: Strong Distrust (−10 to −7), Moderate Distrust (−6 to −3), Mild Distrust (−2 to −1), Neutral (0), Mild Trust (1 to 2), Moderate Trust (3 to 6), and Strong Trust (7 to 10). Rounding uses Python's round, which sends exact .5 ties to the nearest even integer, so an average of 6.5 becomes 6 and is labeled Moderate Trust. Nodes with no incoming ratings were labeled No Incoming. These labels are used in the group comparisons and effect-size analyses that follow.
og_df = pd.read_csv("soc-sign-bitcoinotc.csv", sep=',', header=None,
                    names=["source", "target", "rating", "time"])
# Keep only the most recent rating for each (source, target) pair
df = (
    og_df.sort_values("time")
    .drop_duplicates(subset=["source", "target"], keep="last")
)
# Compute average incoming rating for each target node
node_avg = df.groupby("target")["rating"].mean()
def categorize(r):
    if pd.isna(r):
        return "No Incoming"
    r = round(r)  # round to nearest integer (exact .5 ties go to the even neighbor)
    if -10 <= r <= -7:
        return "Strong Distrust"
    elif -6 <= r <= -3:
        return "Moderate Distrust"
    elif -2 <= r <= -1:
        return "Mild Distrust"
    elif r == 0:
        return "Neutral"
    elif 1 <= r <= 2:
        return "Mild Trust"
    elif 3 <= r <= 6:
        return "Moderate Trust"
    elif 7 <= r <= 10:
        return "Strong Trust"
    else:
        return "Uncategorized"
node_labels = node_avg.apply(categorize)
df = df.merge(node_labels.rename("target_label"), on="target", how="left")
# Build a directed graph
G = nx.from_pandas_edgelist(
    df,
    source="source",
    target="target",
    # edge_attr=["rating"],  # optional edge attributes
    create_using=nx.DiGraph()
)
# nx.set_node_attributes(G, node_labels.to_dict(), "category")
all_nodes = pd.Index(G.nodes())
node_labels_full = node_labels.reindex(all_nodes, fill_value="No Incoming")
# Assign categories
nx.set_node_attributes(G, node_labels_full.to_dict(), "category")
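The deduplication and labeling steps can be checked on a tiny hand-made edge list. The sketch below uses hypothetical toy ratings (not rows from the real file) and a simplified stand-in for categorize named label_from_avg, which is an illustrative helper assumed to use the same bins. It also makes the rounding subtlety visible: an average of 0.5 rounds to 0, and 6.5 rounds to 6.

```python
import pandas as pd

# Hypothetical toy ratings: rater 1 rates user 2 twice, so only the later rating survives.
toy = pd.DataFrame(
    {
        "source": [1, 1, 3],
        "target": [2, 2, 2],
        "rating": [5, -3, 4],
        "time": [100, 200, 150],
    }
)

# Keep the most recent rating for each (source, target) pair.
latest = toy.sort_values("time").drop_duplicates(subset=["source", "target"], keep="last")

# Average incoming rating per ratee: (-3 + 4) / 2 = 0.5 for user 2.
avg_in = latest.groupby("target")["rating"].mean()


def label_from_avg(r: float) -> str:
    """Hypothetical stand-in for categorize(): same bins, applied after rounding."""
    r = round(r)  # Python rounds exact .5 ties to the nearest even integer
    if r <= -7:
        return "Strong Distrust"
    if r <= -3:
        return "Moderate Distrust"
    if r <= -1:
        return "Mild Distrust"
    if r == 0:
        return "Neutral"
    if r <= 2:
        return "Mild Trust"
    if r <= 6:
        return "Moderate Trust"
    return "Strong Trust"


labels = avg_in.apply(label_from_avg)
# Users who only give ratings (1 and 3) get the fallback label, mirroring reindex() above.
all_nodes = pd.Index(sorted(set(toy["source"]) | set(toy["target"])))
labels_full = labels.reindex(all_nodes, fill_value="No Incoming")
print(labels_full.to_dict())
```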
3. General Graph
The dataset was represented as a directed graph where nodes are users and edges are trust ratings. To aid interpretation, nodes were color-coded by reputation category: red, orange, and yellow for strong, moderate, and mild distrust; light blue, blue, and green for mild, moderate, and strong trust; purple for neutral; and black for users with no incoming ratings. This scheme makes differences between groups visually clear and highlights patterns in the network structure.
# Compute positions for nodes
pos = nx.spring_layout(G, seed=4) # force-directed layout
category_colors_all = {
    "Strong Distrust": "#d73027",    # red
    "Moderate Distrust": "#fc8d59",  # orange
    "Mild Distrust": "#fee08b",      # yellow
    "Neutral": "#8211ec",            # purple
    "Mild Trust": "#91bfdb",         # light blue
    "Moderate Trust": "#4575b4",     # blue
    "Strong Trust": "#1a9850",       # green
    "No Incoming": "black"
}
# Same palette without "No Incoming", used for the per-category plots
category_color: dict = category_colors_all.copy()
category_color.pop("No Incoming")
categories = nx.get_node_attributes(G, "category")
node_colors = [
    category_colors_all.get(categories.get(n, "Neutral"), "gray")
    for n in G.nodes
]
3.1 Bitcoin OTC Trust Network
patches = [
    mpatches.Patch(color=color, label=category)
    for category, color in category_colors_all.items()
]
plt.figure(figsize=(10, 8))
nx.draw(
    G,
    pos,
    node_color=node_colors,
    with_labels=False,
    node_size=50,
    arrowsize=6
)
plt.legend(handles=patches, loc="best", fontsize=8, frameon=True)
plt.show()
The visualization shows a dense central core of activity surrounded by smaller peripheral nodes. Most users fall within the main cluster, while a few black nodes with no incoming ratings appear disconnected from the core. Some of these nodes still issue ratings to others, so they influence reputations without having a reputation of their own. Because these nodes have no reputation category to compare, and because path-based metrics such as closeness and betweenness are hard to interpret across disconnected pieces of the graph, it is necessary to identify how many such nodes exist and decide whether they should be excluded.
To address this, the network was decomposed into its weakly connected components. A weakly connected component is a maximal set of nodes in which every pair is joined by a path when edge directions are ignored.
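The difference between weak and strong connectivity, and the choice of the largest component, can be seen on a small example. The graph below is hypothetical toy data; the sketch only assumes the standard NetworkX component functions. Since weakly_connected_components does not sort components by size, selecting the largest with max(..., key=len) is safer than indexing the first one.

```python
import networkx as nx

# Hypothetical toy graph: users 1 and 3 both rate user 2; user 4 rates user 5; nobody rates back.
G_toy = nx.DiGraph([(1, 2), (3, 2), (4, 5)])

weak = list(nx.weakly_connected_components(G_toy))
strong = list(nx.strongly_connected_components(G_toy))

# Ignoring direction, {1, 2, 3} and {4, 5} hang together.
print(sorted(len(c) for c in weak))
# Respecting direction, no node can reach back, so every node is its own strong component.
print(sorted(len(c) for c in strong))

# Pick the largest weak component explicitly rather than relying on list order.
largest = G_toy.subgraph(max(weak, key=len)).copy()
print(sorted(largest.nodes()))
```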
components = list(nx.weakly_connected_components(G))
len(components)
subgraphs = [G.subgraph(c).copy() for c in components]
size = [len(c) for c in components]
size
[5875, 2, 2, 2]
The decomposition identified four components. The largest contained 5,875 nodes, while the remaining three each consisted of only two nodes. These smaller subgraphs represent isolated exchanges that are disconnected from the main trading community. Because the largest component accounts for nearly the entire network, subsequent analysis was performed on this main subgraph. Within it, nodes labeled No Incoming were also removed, since they have no reputation score to compare across groups.
# weakly_connected_components does not sort components by size,
# so select the largest one explicitly instead of relying on subgraphs[0]
btcTrustGraph: DiGraph = G.subgraph(max(components, key=len)).copy()
to_remove = [n for n, d in btcTrustGraph.nodes(data=True) if d.get("category") == "No Incoming"]
print(f"Removed {len(to_remove)} nodes.")
btcTrustGraph.remove_nodes_from(to_remove)
Removed 22 nodes.
categories2 = nx.get_node_attributes(btcTrustGraph, "category")
node_colors2 = [
    category_colors_all.get(categories2.get(n))
    for n in btcTrustGraph.nodes
]
3.1.2 Bitcoin OTC Trust Network (Cleaned)
plt.figure(figsize=(10, 8))
nx.draw(
    btcTrustGraph,
    pos,
    node_color=node_colors2,
    with_labels=False,
    node_size=50,
    arrowsize=6
)
plt.legend(handles=patches, loc="best", fontsize=8, frameon=True)
plt.show()
3.1.3 Bitcoin OTC Trust Network (3D)
pio.renderers.default = "notebook_connected"
# Step 1: Compute 3D spring layout
pos_3d = nx.spring_layout(btcTrustGraph, dim=3, seed=42)
# Step 2: Build edge trace (None breaks the line between consecutive edges)
edge_x, edge_y, edge_z = [], [], []
for u, v in btcTrustGraph.edges():
    x0, y0, z0 = pos_3d[u]
    x1, y1, z1 = pos_3d[v]
    edge_x += [x0, x1, None]
    edge_y += [y0, y1, None]
    edge_z += [z0, z1, None]
edge_trace = go.Scatter3d(
    x=edge_x, y=edge_y, z=edge_z,
    mode="lines",
    line=dict(width=1, color="gray"),
    opacity=0.3,
    name="Edges"
)
# Step 3: Build one node trace per category
node_traces = []
for category, color in category_colors_all.items():
    xs, ys, zs, texts = [], [], [], []
    for n in btcTrustGraph.nodes():
        if btcTrustGraph.nodes[n].get("category", "Unknown") == category:
            x, y, z = pos_3d[n]
            xs.append(x)
            ys.append(y)
            zs.append(z)
            texts.append(f"Node {n}, {category}")
    if xs:  # add trace only if category has nodes
        trace = go.Scatter3d(
            x=xs, y=ys, z=zs,
            mode="markers",
            marker=dict(size=3, color=color, opacity=0.8),
            text=texts,
            hoverinfo="text",
            name=category
        )
        node_traces.append(trace)
# Step 4: Combine and plot
fig = go.Figure(data=[edge_trace] + node_traces)
fig.update_layout(
    height=800,
    showlegend=True,
    legend=dict(itemsizing="constant"),
    scene=dict(
        xaxis=dict(visible=False),
        yaxis=dict(visible=False),
        zaxis=dict(visible=False)
    )
)
fig.show()
3. Graph Metrics and Analysis
To evaluate the Bitcoin OTC trust network, several centrality measures were computed on the largest weakly connected component, after removing nodes with no incoming ratings. These measures capture different aspects of user activity, visibility, and influence.
- In-Degree Centrality reflects how many ratings a user receives, serving as an indicator of visibility and reputation.
- Out-Degree Centrality captures how many ratings a user gives, highlighting engagement in evaluating others.
- Betweenness Centrality identifies users who act as intermediaries, bridging different parts of the network.
- Closeness Centrality (the closeness_centrality column) is computed with the NetworkX default, which on a directed graph uses inward distance. It therefore measures how easily other users can reach a user along chains of ratings.
- In-Closeness Centrality (the in_closeness_centrality column) is computed on the reversed graph, so despite its name it uses outward distance and measures how quickly a user can reach others through their own ratings.
- Eigenvector Centrality is also computed on the reversed graph, so it rewards users whose outgoing ratings point to other central users, rather than users who are rated by central users.
- PageRank scores users by the long-run share of time a random walker following rating edges spends at them (damping factor 0.85), so ratings received from highly ranked users count for more.
Together, these metrics offer complementary perspectives on user positions and allow comparisons across trust categories to see whether trusted, distrusted, or neutral users hold central or peripheral roles in the marketplace. To support this comparison, a DataFrame is created that stores each node's centrality values alongside its assigned reputation category, providing a structured basis for group-level analysis.
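The direction conventions matter for the closeness_centrality and in_closeness_centrality columns computed next. According to the NetworkX documentation, closeness_centrality on a directed graph uses inward distance (paths that end at the node), and the outward version is obtained by calling it on G.reverse(); in_degree_centrality is the in-degree divided by n − 1. The three-node chain below is a hypothetical toy graph used only to make these conventions visible.

```python
import networkx as nx

# Hypothetical rating chain: a rates b, b rates c.
chain = nx.DiGraph([("a", "b"), ("b", "c")])

inward = nx.closeness_centrality(chain)             # NetworkX default: distance *to* the node
outward = nx.closeness_centrality(chain.reverse())  # distance *from* the node
in_deg = nx.in_degree_centrality(chain)             # in-degree / (n - 1)

for node in chain:
    # c is easiest to reach (high inward), a reaches the most nodes (high outward)
    print(node, round(inward[node], 3), round(outward[node], 3), in_deg[node])
```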
3.1 Metrics Dataset Overview
in_degree_centrality = nx.in_degree_centrality(btcTrustGraph)
# measure of activity or engagement
out_degree_centrality = nx.out_degree_centrality(btcTrustGraph)
degree_centrality = nx.degree_centrality(btcTrustGraph)
betweeness_centrality = nx.betweenness_centrality(btcTrustGraph,normalized=True)
# On a DiGraph, nx.closeness_centrality uses *inward* distance, so this column
# measures how easily other users reach each node along incoming paths.
closeness_centrality = nx.closeness_centrality(btcTrustGraph)
# On the reversed graph the distance is *outward*: despite its name, this column
# measures how quickly each node reaches others through its outgoing ratings.
in_closeness_centrality = nx.closeness_centrality(btcTrustGraph.reverse())
# nx.eigenvector_centrality scores in-edges on a DiGraph; on the reversed graph it
# rewards nodes whose outgoing ratings point to other central nodes.
eigen_centrality = nx.eigenvector_centrality(btcTrustGraph.reverse(),max_iter=1000)
page_rank = nx.pagerank(btcTrustGraph,alpha=0.85)
metrics_rows = []
for node, node_data in btcTrustGraph.nodes(data=True):
    row = {
        "node": node,
        "category": node_data.get("category"),
        "degree_centrality": degree_centrality.get(node, 0),
        "in_degree_centrality": in_degree_centrality.get(node, 0),
        "out_degree_centrality": out_degree_centrality.get(node, 0),
        "betweeness_centrality": betweeness_centrality.get(node, 0),
        "closeness_centrality": closeness_centrality.get(node, 0),
        "eigenvector_centrality": eigen_centrality.get(node, 0),
        "page_rank_centrality": page_rank.get(node, 0),
        "in_closeness_centrality": in_closeness_centrality.get(node, 0)
    }
    metrics_rows.append(row)
metrics_df = pd.DataFrame(metrics_rows)
# NetworkX returns eigenvector scores with unit Euclidean norm; dividing by the
# sum rescales them into shares that add up to 1
metrics_df["eigenvector_centrality_normalized"] = (
    metrics_df["eigenvector_centrality"] / metrics_df["eigenvector_centrality"].sum()
)
metrics_df.head()
| | node | category | degree_centrality | in_degree_centrality | out_degree_centrality | betweeness_centrality | closeness_centrality | eigenvector_centrality | page_rank_centrality | in_closeness_centrality | eigenvector_centrality_normalized |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | Mild Trust | 0.014354 | 0.007519 | 0.006835 | 0.002580 | 0.296250 | 0.039328 | 0.000775 | 0.363433 | 0.001418 |
| 1 | 2 | Moderate Trust | 0.014525 | 0.006835 | 0.007690 | 0.001993 | 0.265617 | 0.019171 | 0.000975 | 0.328481 | 0.000691 |
| 2 | 5 | Mild Trust | 0.001025 | 0.000513 | 0.000513 | 0.000000 | 0.242573 | 0.005455 | 0.000093 | 0.303207 | 0.000197 |
| 3 | 1 | Moderate Trust | 0.075359 | 0.038619 | 0.036740 | 0.045249 | 0.331216 | 0.126908 | 0.005039 | 0.418295 | 0.004575 |
| 4 | 15 | Mild Trust | 0.004785 | 0.002221 | 0.002563 | 0.000687 | 0.252081 | 0.008594 | 0.000323 | 0.308863 | 0.000310 |
3.2 Analysis
def summarize_centrality(metrics_df, metric_col):
    grouped = metrics_df.groupby("category")[metric_col]
    total_by_cat = grouped.sum()
    count_by_cat = grouped.count()
    normalized_by_cat = total_by_cat / count_by_cat
    median_by_cat = grouped.median()
    iqr_by_cat = grouped.quantile(0.75) - grouped.quantile(0.25)
    summary = pd.DataFrame({
        f"total_{metric_col}": total_by_cat,
        "count": count_by_cat,
        f"avg_{metric_col}_per_node": normalized_by_cat,
        f"median_{metric_col}": median_by_cat,
        f"iqr_{metric_col}": iqr_by_cat
    })
    return summary.sort_values(f"avg_{metric_col}_per_node", ascending=False)
def render_eda_plots(df, metric, category_col="category",
                     category_colors=None, log_scale=False):
    # Set the style before creating axes so it applies to this figure as well
    sea.set_style("darkgrid")  # dark background + grid
    categories = list(category_colors.keys()) if category_colors else df[category_col].unique()
    fig = plt.figure(figsize=(14, 11))
    gs = fig.add_gridspec(2, 2, height_ratios=[2, 2])
    ax_pdf = fig.add_subplot(gs[0, 0])
    ax_ecdf = fig.add_subplot(gs[0, 1])
    ax_box = fig.add_subplot(gs[1, :])
    sea.kdeplot(
        data=df, x=metric, hue=category_col,
        hue_order=categories, palette=category_colors,
        fill=False, alpha=1, common_norm=False, ax=ax_pdf
    )
    ax_pdf.set_title(f"PDF of {metric}")
    ax_pdf.set_xlabel(metric); ax_pdf.set_ylabel("Density")
    if log_scale: ax_pdf.set_xscale("log")
    sea.ecdfplot(
        data=df, x=metric, hue=category_col,
        hue_order=categories, palette=category_colors, ax=ax_ecdf
    )
    ax_ecdf.set_title(f"ECDF of {metric}")
    ax_ecdf.set_xlabel(metric); ax_ecdf.set_ylabel("Cumulative Probability")
    if log_scale: ax_ecdf.set_xscale("log")
    sea.boxplot(
        data=df, y=category_col, x=metric,
        order=categories, hue=category_col,
        palette=category_colors, dodge=False, legend=False, ax=ax_box
    )
    ax_box.set_title(f"Boxplot of {metric} by {category_col}")
    ax_box.set_xlabel(metric); ax_box.set_ylabel(category_col)
    if log_scale: ax_box.set_xscale("log")
    plt.tight_layout()
    plt.show()
order = [
    "Strong Distrust",
    "Moderate Distrust",
    "Mild Distrust",
    "Neutral",
    "Mild Trust",
    "Moderate Trust",
    "Strong Trust"]
In-Degree Centrality
render_eda_plots(metrics_df, "in_degree_centrality", category_colors=category_color, log_scale=True)
The plots show that most users in the network receive very few ratings. This is clear from the sharp peaks in the distributions, which all sit close to zero. Users with a Strong Trust label almost always sit at the very bottom, which means that even though they are highly trusted, they are rarely rated. On the other hand, the curves in the cumulative plot suggest that distrusted and neutral users are more likely to reach higher values, meaning they tend to receive more ratings than trusted users.
The boxplot makes this pattern stand out. Distrusted users usually receive more ratings than trusted ones, with higher medians across the distrust categories. Neutral users are also fairly well connected, while the Mild and Moderate Trust groups are pulled down because most users in them receive only a handful of ratings. Strong Trust is the lowest overall, showing that strong positive reputations are both rare and not linked to greater visibility. Overall, these plots show that a good reputation does not necessarily mean a more central position, and in many cases distrusted or neutral users appear more visible than trusted ones.
Out-Degree Centrality
render_eda_plots(metrics_df, "out_degree_centrality", category_colors=category_color, log_scale=True)
The analysis of out-degree centrality shows that neutral, mild trust, and moderate trust users are the most engaged in rating others, with both higher average activity and a subset of highly active individuals. Strong trust users are the least engaged, rarely rating others and showing little variation. Distrust categories generally remain at low levels of engagement but display more variability, with most users clustered near the bottom and a few rating many more people than the rest of their group. Overall, network activity is driven by neutral and mid-trust users, rating activity among distrusted users is low and uneven, and strong trust corresponds to minimal outward involvement.
Betweenness Centrality
render_eda_plots(metrics_df, "betweeness_centrality", category_colors=category_color, log_scale=True)
The probability density plot shows that most users have very small betweenness centrality, meaning they rarely act as bridges in the network. Still, the density curves stretch further for Neutral, Moderate Trust, and Moderate Distrust users, suggesting these groups include more individuals who take on stronger connecting roles. The cumulative distribution supports this view, since their curves rise more slowly, showing that a larger share of these users reach higher betweenness values compared to the other categories.
The boxplot confirms these differences. Neutral users have the highest median betweenness, followed closely by Moderate Trust and Moderate Distrust. This indicates that these three groups are the most likely to sit “in between” others and link different parts of the network. By contrast, Mild Trust, Mild Distrust, and Strong Distrust have lower medians, and Strong Trust users play almost no bridging role at all.
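To make the bridging interpretation concrete, the sketch below computes normalized betweenness on two hypothetical toy graphs. For directed graphs NetworkX normalizes by (n − 1)(n − 2), the number of ordered pairs of other nodes, so in a three-node chain the middle node, which lies on the only shortest path from a to c, scores 1/2. When a second, equally short route exists, the credit for that pair is split between the two intermediaries.

```python
import networkx as nx

# Hypothetical chain: the only shortest path from a to c runs through b.
chain = nx.DiGraph([("a", "b"), ("b", "c")])
bc = nx.betweenness_centrality(chain, normalized=True)

# Add a second, equally short route a -> d -> c: b and d now split the credit for (a, c),
# and the normalization constant grows to (4 - 1) * (4 - 2) = 6.
two_routes = nx.DiGraph([("a", "b"), ("b", "c"), ("a", "d"), ("d", "c")])
bc2 = nx.betweenness_centrality(two_routes, normalized=True)

print(bc["b"], round(bc2["b"], 4), round(bc2["d"], 4))
```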
Closeness Centrality (Inward Distance)
render_eda_plots(metrics_df, "closeness_centrality", category_colors=category_color, log_scale=False)
The plots show that most users fall into a similar range of closeness centrality. Because NetworkX computes closeness on a directed graph from inward distance, this column measures how easily other users can reach a given user along chains of ratings, not how easily that user reaches others. Neutral users, along with Moderate and Strong Distrust, have the highest medians, so these groups tend to sit a short rating chain away from many other users. By contrast, Mild and Strong Trust users have lower medians and more spread, with only a few outliers standing out.
In practice, this means that neutral and distrusted users are close to many raters, which is consistent with the higher in-degree these groups show. Trusted users, especially those with very strong positive reputations, sit farther from most raters by this measure. Being easy to reach along incoming paths is therefore not aligned with being highly trusted, and this kind of proximity is more often associated with neutral or negative reputation groups.
In-Closeness Centrality (Reversed Graph, Outward Distance)
render_eda_plots(metrics_df, "in_closeness_centrality", category_colors=category_color, log_scale=True)
The plots show that while all categories overlap heavily, there are clear shifts between groups. Because this column is computed on the reversed graph, it uses outward distance: it measures how quickly a user can reach others by following their own ratings and the ratings those users give in turn. The PDF shows trusted groups peaking at higher values and distrust groups peaking lower. The ECDF supports this: distrust categories accumulate faster at lower values, while trusted categories climb later, at higher values. The boxplots confirm the pattern, as the interquartile ranges for trust categories sit higher than those for distrust categories, even though the groups still overlap widely.
In practice, this means that trusted users can reach more of the Bitcoin OTC network in fewer steps through their outgoing ratings. Distrusted users, by contrast, sit further from the rest of the network in this outward sense, while neutral users fall somewhere in between. Although the categories overlap, the consistent shift across the plots shows that reputation is linked to outward reach: positive reputations go with shorter outgoing paths, and negative reputations with longer ones.
PageRank
render_eda_plots(metrics_df, "page_rank_centrality", category_colors=category_color, log_scale=True)
summarize_centrality(metrics_df,"page_rank_centrality")
| category | total_page_rank_centrality | count | avg_page_rank_centrality_per_node | median_page_rank_centrality | iqr_page_rank_centrality |
|---|---|---|---|---|---|
| Neutral | 0.044191 | 137 | 0.000323 | 0.000189 | 0.000223 |
| Moderate Trust | 0.121858 | 564 | 0.000216 | 0.000089 | 0.000105 |
| Mild Trust | 0.727589 | 4309 | 0.000169 | 0.000074 | 0.000079 |
| Mild Distrust | 0.040249 | 258 | 0.000156 | 0.000076 | 0.000133 |
| Moderate Distrust | 0.038248 | 270 | 0.000142 | 0.000106 | 0.000088 |
| Strong Distrust | 0.023307 | 249 | 0.000094 | 0.000061 | 0.000050 |
| Strong Trust | 0.004558 | 66 | 0.000069 | 0.000058 | 0.000015 |
The PageRank results show that influence is concentrated away from the extremes of the trust spectrum. On a per-node basis, Neutral users hold the highest average (0.000323) and median (0.000189), followed by Moderate Trust (average 0.000216) and Mild Trust (0.000169). Moderate Distrust has a lower average (0.000142) but the second-highest median (0.000106), so a typical user in that group is comparatively well placed. Mild Distrust falls in the middle, while Strong Distrust and Strong Trust sit at the bottom, with the lowest averages and medians, which suggests little structural visibility. This confirms that extremes in either direction are not where influence is concentrated.
Looking at the aggregate totals, however, a different picture emerges. Because PageRank scores sum to 1 across the network, each category total is that group's share of the overall PageRank mass. Mild Trust dominates with about 0.73 of the total, not because each node is exceptionally influential, but because the category is so large (4,309 of 5,853 nodes). Moderate Trust is next at about 0.12, outweighing all three distrust categories combined (about 0.10). Neutral users, despite the highest per-node scores, contribute only about 0.04 because there are only 137 of them, and Mild and Moderate Distrust contribute similar amounts. Strong Distrust and Strong Trust have almost no impact at the network level. Taken together, the backbone of network visibility is built around Mild and Moderate Trust, while Neutral users carry outsized influence per node but cannot shift the total balance.
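The per-node versus aggregate comparison works because PageRank scores form a probability distribution over nodes. The sketch below runs a plain power iteration on a hypothetical four-node graph with no dangling nodes and compares it with nx.pagerank at the same damping factor of 0.85. It is a simplified illustration of the random-walk definition, not NetworkX's implementation, which also handles dangling nodes and personalization.

```python
import networkx as nx
import numpy as np

# Hypothetical toy graph; every node has at least one outgoing edge (no dangling nodes).
G_pr = nx.DiGraph([("a", "b"), ("b", "c"), ("c", "a"), ("d", "a"), ("a", "d")])
nodes = list(G_pr)
n = len(nodes)
alpha = 0.85

# Column-stochastic transition matrix: P[j, i] = 1 / outdeg(i) for each edge i -> j.
P = np.zeros((n, n))
for i, u in enumerate(nodes):
    succ = list(G_pr.successors(u))
    for v in succ:
        P[nodes.index(v), i] = 1.0 / len(succ)

# Power iteration: follow an edge with probability alpha, teleport uniformly otherwise.
x = np.full(n, 1.0 / n)
for _ in range(200):
    x = alpha * P @ x + (1 - alpha) / n

manual = dict(zip(nodes, x))
reference = nx.pagerank(G_pr, alpha=alpha, max_iter=1000, tol=1e-10)
print({k: round(v, 4) for k, v in manual.items()})
```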
Eigenvector Centrality
render_eda_plots(metrics_df, "eigenvector_centrality_normalized", category_colors=category_color, log_scale=True)
summarize_centrality(metrics_df,"eigenvector_centrality_normalized")
| category | total_eigenvector_centrality_normalized | count | avg_eigenvector_centrality_normalized_per_node | median_eigenvector_centrality_normalized | iqr_eigenvector_centrality_normalized |
|---|---|---|---|---|---|
| Neutral | 0.045962 | 137 | 0.000335 | 1.508489e-04 | 0.000351 |
| Mild Trust | 0.801224 | 4309 | 0.000186 | 4.868150e-05 | 0.000157 |
| Moderate Trust | 0.080592 | 564 | 0.000143 | 2.964720e-05 | 0.000119 |
| Mild Distrust | 0.027189 | 258 | 0.000105 | 1.706463e-21 | 0.000119 |
| Strong Distrust | 0.022577 | 249 | 0.000091 | 2.083085e-25 | 0.000089 |
| Moderate Distrust | 0.021612 | 270 | 0.000080 | 8.651288e-06 | 0.000113 |
| Strong Trust | 0.000844 | 66 | 0.000013 | 1.640893e-06 | 0.000010 |
The eigenvector centrality results show that the most structurally influential users are not at the extremes of trust or distrust but in the middle categories. Neutral, Mild Trust, and Moderate Trust users have the highest averages and medians and substantial interquartile ranges, so a typical user in these groups is linked to other influential nodes. Because the scores are computed on the reversed graph, this reflects users whose outgoing ratings point to other central users. This suggests that users with moderate or balanced reputations are the ones most embedded in the influential core of the network.
Distrust presents a different picture. Moderate Distrust has a higher median than Strong Trust, but the general pattern for distrust is uneven. In Mild and Strong Distrust, at least half of the users have effectively zero scores (medians of about 10⁻²¹ and 10⁻²⁵), yet their interquartile ranges are comparable to those of the trust groups, so the upper quarter of these groups still reaches noticeable values. Strong Trust has the lowest average and the narrowest spread of all. Overall, the network's influential backbone is shaped more by moderate and neutral reputations than by extremes, while distrust produces scattered outliers rather than consistent structural importance.
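Two details behind these numbers can be checked on a small example. NetworkX's eigenvector_centrality scores a node through its in-edges on a directed graph, so computing it on G.reverse() rewards nodes whose outgoing edges point to central nodes. NetworkX also returns a vector with unit Euclidean length, so dividing by the sum turns the scores into shares that add up to 1. The strongly connected graph below is hypothetical toy data.

```python
import networkx as nx

# Hypothetical strongly connected toy graph: a -> b -> c -> a, plus a -> c.
G_ev = nx.DiGraph([("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")])

ev_in = nx.eigenvector_centrality(G_ev, max_iter=1000)             # credit flows along in-edges
ev_out = nx.eigenvector_centrality(G_ev.reverse(), max_iter=1000)  # credit flows along out-edges

# Share-style normalization, mirroring eigenvector_centrality_normalized.
total = sum(ev_out.values())
ev_out_share = {k: v / total for k, v in ev_out.items()}

# c is rated by both other nodes; a rates both other nodes.
print(max(ev_in, key=ev_in.get), max(ev_out, key=ev_out.get))
```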
4. Statistical Evaluation of Network Measures
This section presents the statistical evaluation of network measures to understand how reputation categories relate to user positions in the Bitcoin OTC network. The analysis includes tests of group differences using the Kruskal–Wallis method, an examination of correlations between centrality metrics, and an assessment of assortativity to explore how users connect across reputation categories.
kruskal_report_m = list(filter(lambda m: m not in ["node","category","degree_centrality","eigenvector_centrality"],metrics_df.columns))
4.1 Kruskal–Wallis
def label_effect(f):
    # Cohen's conventional benchmarks for f: 0.10 small, 0.25 medium, 0.40 large
    if f < 0.10:
        return "negligible"
    elif f < 0.25:
        return "small"
    elif f < 0.40:
        return "medium"
    else:
        return "large"
def eta_squared_kw(H, n, k):
    """Eta-squared effect size for Kruskal–Wallis."""
    return max(0.0, (H - k + 1) / (n - k))
def epsilon_squared_kw(H, n, k):
    """Epsilon-squared effect size for Kruskal–Wallis."""
    # Variant: the commonly cited form is H / (n - 1); this version also subtracts k.
    return max(0.0, (H - k) / (n - 1))
def run_kruskal_test(df: pd.DataFrame, metric: str, group_col: str = "category"):
    groups = [sub_df[metric].values for _, sub_df in df.groupby(group_col)]
    # Run Kruskal–Wallis
    stat, p = kruskal(*groups)
    result = {
        "metric": metric,
        "H_statistic": stat,
        "p_value": p,
        "significant": p < 0.05
    }
    return result
# Run the test for every metric in kruskal_report_m
kr_rows = []
for metric in kruskal_report_m:
    kr_rows.append(run_kruskal_test(metrics_df, metric))
n = len(metrics_df)
k = metrics_df['category'].nunique()
kruskal_df = pd.DataFrame(kr_rows)
kruskal_df["eta2_kw"] = kruskal_df["H_statistic"].apply(lambda H: eta_squared_kw(H, n, k))
kruskal_df["eps2_kw"] = kruskal_df["H_statistic"].apply(lambda H: epsilon_squared_kw(H, n, k))
kruskal_df["cohen_f"] = np.sqrt(kruskal_df["eta2_kw"] / (1 - kruskal_df["eta2_kw"]))
kruskal_df["effect_size_label"] = kruskal_df["cohen_f"].apply(label_effect)
kruskal_df
| | metric | H_statistic | p_value | significant | eta2_kw | eps2_kw | cohen_f | effect_size_label |
|---|---|---|---|---|---|---|---|---|
| 0 | in_degree_centrality | 297.486148 | 2.827395e-61 | True | 0.049861 | 0.049639 | 0.229079 | small |
| 1 | out_degree_centrality | 226.693699 | 3.886335e-46 | True | 0.037751 | 0.037542 | 0.198071 | small |
| 2 | betweeness_centrality | 172.434201 | 1.369681e-34 | True | 0.028470 | 0.028270 | 0.171184 | small |
| 3 | closeness_centrality | 206.291459 | 8.683040e-42 | True | 0.034261 | 0.034055 | 0.188353 | small |
| 4 | page_rank_centrality | 248.353705 | 9.219851e-51 | True | 0.041456 | 0.041243 | 0.207965 | small |
| 5 | in_closeness_centrality | 326.136867 | 2.038522e-67 | True | 0.054762 | 0.054535 | 0.240695 | small |
| 6 | eigenvector_centrality_normalized | 283.843597 | 2.362295e-58 | True | 0.047527 | 0.047308 | 0.223380 | small |
We used the Kruskal–Wallis test to check whether centrality scores differ across the reputation categories in the Bitcoin OTC network. All seven tests are highly significant (every p-value is below 10⁻³³), so the differences between groups are very unlikely to be due to chance. With nearly 6,000 nodes, however, even modest differences reach significance, so the effect sizes are needed to judge how large the differences are.
All seven effect sizes fall in the small range (0.10 ≤ f < 0.25 under the thresholds in label_effect); none reaches a medium effect. The largest effect is for in-closeness centrality (Cohen's f = 0.24), followed closely by in-degree centrality (f = 0.23) and normalized eigenvector centrality (f = 0.22). Reputation groups therefore differ most, though still modestly, in how many ratings users receive and how close they sit to the rest of the network. PageRank (f = 0.21), out-degree (f = 0.20), closeness (f = 0.19), and betweenness (f = 0.17) show smaller effects, meaning reputation is only weakly associated with rating activity and bridging roles.
Overall, reputation categories differ reliably on every centrality measure, but the differences are small; in-closeness and in-degree carry the clearest signal.
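The effect sizes in the table follow directly from H, the number of observations n, and the number of groups k. The sketch below recomputes the in-degree row with the same formulas as eta_squared_kw and the Cohen's f conversion. It takes n = 5853 from the category counts in the summary tables and k = 7; only H is copied from the results table.

```python
import math

H = 297.486148  # Kruskal-Wallis H for in_degree_centrality, copied from the results table
counts = [137, 564, 4309, 258, 270, 249, 66]  # nodes per reputation category
n, k = sum(counts), len(counts)

eta2 = max(0.0, (H - k + 1) / (n - k))  # eta-squared for Kruskal-Wallis
f = math.sqrt(eta2 / (1 - eta2))        # Cohen's f from eta-squared

print(n, round(eta2, 6), round(f, 6))
```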
4.2 Correlation Matrix of Centrality Metrics
from itertools import combinations
corr_results = []
# only unique unordered pairs
for m1, m2 in combinations(kruskal_report_m, 2):
    rho, p = spearmanr(metrics_df[m1], metrics_df[m2], nan_policy="omit")
    corr_results.append({
        "metric1": m1,
        "metric2": m2,
        "rho": rho,
        "p_value": p
    })
corr_df = pd.DataFrame(corr_results)
# corr_df.sort_values("rho", ascending=False)
corr_matrix = metrics_df[kruskal_report_m].corr(method="spearman")
corr_matrix
| | in_degree_centrality | out_degree_centrality | betweeness_centrality | closeness_centrality | page_rank_centrality | in_closeness_centrality | eigenvector_centrality_normalized |
|---|---|---|---|---|---|---|---|
| in_degree_centrality | 1.000000 | 0.825395 | 0.865813 | 0.655674 | 0.902190 | 0.624657 | 0.637755 |
| out_degree_centrality | 0.825395 | 1.000000 | 0.884357 | 0.547877 | 0.819802 | 0.823183 | 0.833057 |
| betweeness_centrality | 0.865813 | 0.884357 | 1.000000 | 0.561895 | 0.829043 | 0.665120 | 0.667593 |
| closeness_centrality | 0.655674 | 0.547877 | 0.561895 | 1.000000 | 0.498666 | 0.704406 | 0.702867 |
| page_rank_centrality | 0.902190 | 0.819802 | 0.829043 | 0.498666 | 1.000000 | 0.566822 | 0.578146 |
| in_closeness_centrality | 0.624657 | 0.823183 | 0.665120 | 0.704406 | 0.566822 | 1.000000 | 0.989354 |
| eigenvector_centrality_normalized | 0.637755 | 0.833057 | 0.667593 | 0.702867 | 0.578146 | 0.989354 | 1.000000 |
plt.figure(figsize=(14, 10))
sea.heatmap(
    corr_matrix,
    annot=True,
    fmt=".2f",
    cmap="coolwarm",
    center=0,
    square=True
)
plt.title("Spearman Correlation Between Centrality Metrics", fontsize=14)
# Rotate tick labels before tight_layout so the layout accounts for them
plt.xticks(rotation=15, ha="right", fontsize=10)
plt.tight_layout()
plt.show()
The correlation results show that several centrality measures are strongly related. In-degree, out-degree, betweenness, and PageRank move closely together (ρ between 0.82 and 0.90), indicating that users who receive many ratings also tend to give many, often sit on paths connecting others, and rank highly in PageRank. In-closeness and eigenvector centrality are almost perfectly correlated (ρ = 0.99); both are computed on the reversed graph, so both reward users whose outgoing ratings quickly reach the rest of the network, and both correlate strongly with out-degree (ρ of about 0.82 to 0.83). In contrast, closeness centrality is less strongly tied to the other measures (ρ between 0.50 and 0.70). Because NetworkX computes it from inward distance, it captures a different aspect of network position: how easily other users can reach a given user.
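Spearman's ρ is the Pearson correlation of the ranks, which is why it suits these heavily skewed centrality scores: only the ordering of users matters, not the size of the long tail. The small check below, on hypothetical toy values, confirms that pandas' corr(method="spearman") matches a Pearson correlation computed on rank()-transformed columns, with tied values sharing their average rank.

```python
import pandas as pd

# Hypothetical skewed scores for five users; one extreme value dominates the raw scale.
toy = pd.DataFrame(
    {
        "in_degree": [0.001, 0.002, 0.002, 0.010, 0.900],
        "pagerank": [0.0001, 0.0003, 0.0002, 0.0040, 0.0050],
    }
)

rho = toy.corr(method="spearman").loc["in_degree", "pagerank"]
ranked = toy.rank()  # average ranks, so the two 0.002 values share rank 2.5
rho_by_hand = ranked.corr(method="pearson").loc["in_degree", "pagerank"]

print(round(rho, 4), round(rho_by_hand, 4))
```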
4.3 Assortativity
assortativity = nx.attribute_assortativity_coefficient(btcTrustGraph, "category")
print("Category assortativity:", assortativity)
Category assortativity: 0.11691432261137268
The category assortativity score for the Bitcoin OTC network is 0.117, which is positive but low. A value of 1 would mean that ratings stay entirely within reputation categories, and a value of 0 would mean no more within-category rating than expected from the category sizes alone. The score therefore indicates a weak tendency for users to rate others in the same reputation category. The single coefficient does not show which categories drive this tendency, so it does not by itself establish that trusted users prefer other trusted users or that distrusted users cluster together. What it does show is considerable cross-category interaction: users often rate across trust boundaries despite a mild pull toward their own group.
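The coefficient reported above is Newman's attribute assortativity, r = (Σᵢ eᵢᵢ − Σᵢ aᵢbᵢ) / (1 − Σᵢ aᵢbᵢ), where eᵢⱼ is the fraction of edges running from category i to category j, and aᵢ and bᵢ are the row and column sums of that matrix. The hypothetical four-node example below computes r by hand from nx.attribute_mixing_matrix and checks it against nx.attribute_assortativity_coefficient.

```python
import networkx as nx
import numpy as np

# Hypothetical toy network: two trusted (T) and two distrusted (D) users.
G_as = nx.DiGraph()
G_as.add_nodes_from(["t1", "t2"], category="T")
G_as.add_nodes_from(["d1", "d2"], category="D")
G_as.add_edges_from([("t1", "t2"), ("t2", "t1"), ("d1", "d2"), ("t1", "d1")])

# e[i, j]: fraction of edges from category i to category j.
e = nx.attribute_mixing_matrix(G_as, "category")
a = e.sum(axis=1)  # share of edges leaving each category
b = e.sum(axis=0)  # share of edges entering each category

expected_same = float(a @ b)  # within-category share expected from the margins alone
r_manual = (np.trace(e) - expected_same) / (1 - expected_same)
r_nx = nx.attribute_assortativity_coefficient(G_as, "category")
print(round(r_manual, 4), round(r_nx, 4))
```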
5. Conclusion
Looking at both the plots and the statistical tests gives us a clearer picture of how reputation shapes network position. Each metric highlights something a little different:
- In-degree centrality (ratings received): Distrusted and Neutral users usually get more ratings than Trusted ones, so they appear more visible.
- Out-degree centrality (ratings given): Neutral, Mild Trust, and Moderate Trust users are the most active in rating others, while Strong Trust users almost never do.
- Betweenness centrality (bridging roles): Neutral, Moderate Trust, and Moderate Distrust users are the ones most often linking different parts of the network.
- Closeness centrality (inward distance, how easily others reach a user): Neutral and Distrusted users are closer to the rest of the network, while Trusted users are generally farther away.
- In-closeness centrality (computed on the reversed graph, so outward distance, how quickly a user reaches others): Trusted users reach the rest of the network more directly, while Distrusted ones sit further out.
- PageRank and Eigenvector centrality (influence): Neutral users have the highest per-node scores, followed by Mild and Moderate Trust users, who also hold most of the total because of their numbers. Strong Trust and Strong Distrust sit at the edges with little impact.
From these plots, it’s clear that reputation doesn’t automatically mean influence or visibility. In fact, users in the middle of the spectrum, and sometimes even those with negative reputations, often hold key positions.
The tests backed this up but added nuance. The Kruskal–Wallis results showed that the differences between groups are statistically significant and very unlikely to be due to chance. Still, the effect sizes showed that all of these differences are small, with in-closeness and in-degree centrality showing the largest effects. The correlations showed a lot of overlap between measures, which helps explain why the same groups keep showing up across metrics. And the assortativity test suggested that while users lean slightly toward rating others in the same reputation group, there is still a lot of mixing across categories.
All together, the evidence shows that being highly trusted doesn’t guarantee centrality in the Bitcoin OTC network. Instead, visibility and influence are more often held by Neutral, Mild Trust, and Moderate Trust users, with Distrusted groups also playing key roles at times. Strong Trust and Strong Distrust users tend to stay more on the margins. In short, the real drivers of interaction and influence are those in the middle ground, not at the extremes.